{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 32bit AES\n",
    "\n",
    "Supported setups:\n",
    "\n",
    "SCOPES:\n",
    "\n",
    "* OPENADC\n",
    "* CWNANO\n",
    "\n",
    "PLATFORMS:\n",
    "\n",
    "* CWLITEARM\n",
    "* CWNANO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Previous AES tutorials (even on 32-bit targets) ran 8-bit modes of operation. We can target typical implementation on Arm devices which actually looks a little different.\n",
    "\n",
    "This tutorial is ONLY possible if you have an ARM target. For example the CWLite Arm target or the UFO Board with an STM32F target (or similar)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "SCOPETYPE = 'OPENADC'\n",
    "PLATFORM = 'CWLITEARM'\n",
    "N = 1500\n",
    "CHECK_CORR = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Background"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A 32-bit machine can operate on 32-bit words, so it seems wasteful to use the same 8-bit operations. Indeed we can speed up the AES operation considerably by generating several tables (called T-Tables), as was described in the book [The Design of Rijndael](http://www.springer.com/gp/book/9783540425809) which was published by the authors of AES.\n",
    "\n",
    "In order to take advantage of our 32 bit machine, we can examine a typical round of AES. With the exception of the final round, each round looks like:\n",
    "\n",
    "$\\text{a = Round Input}$\n",
    "\n",
    "$\\text{b = SubBytes(a)}$\n",
    "\n",
    "$\\text{c = ShiftRows(b)}$\n",
    "\n",
    "$\\text{d = MixColumns(c)}$\n",
    "\n",
    "$\\text{a' = AddRoundKey(d) = Round Output}$\n",
    "\n",
    "We'll leave AddRoundKey the way it is. The other operations are:\n",
    "\n",
    "$b_{i,j} = \\text{sbox}[a_{i,j}]$\n",
    "\n",
    "$\\left[ \\begin{array} { c } { c _ { 0 , j } } \\\\ { c _ { 1 , j } } \\\\ { c _ { 2 , j } } \\\\ { c _ { 3 , j } } \\end{array} \\right] = \\left[ \\begin{array} { l } { b _ { 0 , j + 0 } } \\\\ { b _ { 1 , j + 1 } } \\\\ { b _ { 2 , j + 2 } } \\\\ { b _ { 3 , j + 3 } } \\end{array} \\right]$\n",
    "\n",
    "$\\left[ \\begin{array} { l } { d _ { 0 , j } } \\\\ { d _ { 1 , j } } \\\\ { d _ { 2 , j } } \\\\ { d _ { 3 , j } } \\end{array} \\right] = \\left[ \\begin{array} { l l l l } { 02 } & { 03 } & { 01 } & { 01 } \\\\ { 01 } & { 02 } & { 03 } & { 01 } \\\\ { 01 } & { 01 } & { 02 } & { 03 } \\\\ { 03 } & { 01 } & { 01 } & { 02 } \\end{array} \\right] \\times \\left[ \\begin{array} { c } { c _ { 0 , j } } \\\\ { c _ { 1 , j } } \\\\ { c _ { 2 , j } } \\\\ { c _ { 3 , j } } \\end{array} \\right]$\n",
    "\n",
    "Note that the ShiftRows operation $b_{i, j+c}$ is a cyclic shift and the matrix multiplcation in MixColumns denotes the xtime operation in GF($2^8$).\n",
    "\n",
    "It's possible to combine all three of these operations into a single line. We can write 4 bytes of $d$ as the linear combination of four different 4 byte vectors:\n",
    "\n",
    "$\\left[ \\begin{array} { l } { d _ { 0 , j } } \\\\ { d _ { 1 , j } } \\\\ { d _ { 2 , j } } \\\\ { d _ { 3 , j } } \\end{array} \\right] = \\left[ \\begin{array} { l } { 02 } \\\\ { 01 } \\\\ { 01 } \\\\ { 03 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 0 , j + 0 } \\right] \\oplus \\left[ \\begin{array} { l } { 03 } \\\\ { 02 } \\\\ { 01 } \\\\ { 01 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 1 , j + 1 } \\right] \\oplus \\left[ \\begin{array} { c } { 01 } \\\\ { 03 } \\\\ { 02 } \\\\ { 01 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 2 , j + 2 } \\right] \\oplus \\left[ \\begin{array} { c } { 01 } \\\\ { 01 } \\\\ { 03 } \\\\ { 02 } \\end{array} \\right] \\operatorname { sbox } \\left[ a _ { 3 , j + 3 } \\right]$\n",
    "\n",
    "Now, for each of these four components, we can tabulate the outputs for every possible 8-bit input:\n",
    "\n",
    "$T _ { 0 } [ a ] = \\left[ \\begin{array} { l l } { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 1 } [ a ] = \\left[ \\begin{array} { l } { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 2 } [ a ] = \\left[ \\begin{array} { l l } { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "$T _ { 3 } [ a ] = \\left[ \\begin{array} { l l } { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 01 \\times \\operatorname { sbox } [ a ] } \\\\ { 03 \\times \\operatorname { sbox } [ a ] } \\\\ { 02 \\times \\operatorname { sbox } [ a ] } \\end{array} \\right]$\n",
    "\n",
    "These tables have 2^8 different 32-bit entries, so together the tables take up 4 kB. Finally, we can quickly compute one round of AES by calculating\n",
    "\n",
    "$\\left[ \\begin{array} { l } { d _ { 0 , j } } \\\\ { d _ { 1 , j } } \\\\ { d _ { 2 , j } } \\\\ { d _ { 3 , j } } \\end{array} \\right] = T _ { 0 } \\left[ a _ { 0 } , j + 0 \\right] \\oplus T _ { 1 } \\left[ a _ { 1 } , j + 1 \\right] \\oplus T _ { 2 } \\left[ a _ { 2 } , j + 2 \\right] \\oplus T _ { 3 } \\left[ a _ { 3 } , j + 3 \\right]$\n",
    "\n",
    "All together, with AddRoundKey at the end, a single round now takes 16 table lookups and 16 32-bit XOR operations. This arrangement is much more efficient than the traditional 8-bit implementation. There are a few more tradeoffs that can be made: for instance, the tables only differ by 8-bit shifts, so it's also possible to store only 1 kB of lookup tables at the expense of a few rotate operations.\n",
    "\n",
    "Note that T-tables don't have a big effect on AES from a side-channel analysis perspective. The SubBytes output is still buried in the T-tables and the other operations are linear, so it's still possible to attack 32-bit AES using the same 8-bit attack methods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Firmware"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Firmware is the same as previous AES examples, except this time we'll need to build it using MBEDTLS (previous examples used TINYAES128C):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "cd ../hardware/victims/firmware/\n",
    "mkdir -p simpleserial-aes-lab2 && cp -r simpleserial-aes/* $_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CRYPTO_TARGET = \"MBEDTLS\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash -s \"$PLATFORM\" \"$CRYPTO_TARGET\"\n",
    "cd ../hardware/victims/firmware/simpleserial-aes-lab2\n",
    "make PLATFORM=$1 CRYPTO_TARGET=$2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running the Attack"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Neither the attack, nor the analysis is any different from a normal AES attack as we don't actually care about the T-Table implementation. We will need to capture more traces (500), but our traces also don't have to be as long (only 1500 samples)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Capturing Traces"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%run \"Helper_Scripts/Setup_Generic.ipynb\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fw_path = \"../hardware/victims/firmware/simpleserial-aes-lab2/simpleserial-aes-{}.hex\".format(PLATFORM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cw.program_target(scope, prog, fw_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "project = cw.create_project(\"projects/AES32.cwp\", overwrite = True)\n",
    "\n",
    "#Capture Traces\n",
    "from tqdm import tnrange\n",
    "import numpy as np\n",
    "import time\n",
    "\n",
    "ktp = cw.ktp.Basic()\n",
    "scope.adc.samples = 1500\n",
    "\n",
    "for i in tnrange(N, desc='Capturing traces'):\n",
    "    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here\n",
    "    trace = cw.capture_trace(scope, target, text, key)\n",
    "    if trace is None:\n",
    "        continue\n",
    "    project.traces.append(trace)\n",
    "project.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# cleanup the connection to the target and scope\n",
    "scope.dis()\n",
    "target.dis()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import chipwhisperer.analyzer as cwa\n",
    "leak_model = cwa.leakage_models.sbox_output\n",
    "attack = cwa.cpa(project, leak_model)\n",
    "\n",
    "cb = cwa.get_jupyter_callback(attack)   \n",
    "attack_results = attack.run(cb, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With that, you should see red numbers fill the top of the table."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll skip most of the plots this time around. We will, however, take a look at output vs. time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bokeh.plotting import figure, show\n",
    "from bokeh.io import output_notebook\n",
    "plot_data = cwa.analyzer_plots(attack_results)\n",
    "\n",
    "output_notebook()\n",
    "rets = []\n",
    "for i in range(0, 16):\n",
    "    rets.append(plot_data.output_vs_time(i))\n",
    "\n",
    "p = figure()\n",
    "for ret in rets:\n",
    "    p.line(ret[0], ret[2], line_color='green')\n",
    "    p.line(ret[0], ret[3], line_color='green')\n",
    "    \n",
    "for ret in rets:\n",
    "    p.line(ret[0], ret[1], line_color='red')\n",
    "\n",
    "show(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Zooming in, you should see that the resulting plot looks \"messier\" than the 8-bit implementation:\n",
    "\n",
    "![](https://wiki.newae.com/images/e/e3/32bit_AES_outvstime.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "key = np.array(project.keys[0])\n",
    "recv_key = [kguess[0][0] for kguess in attack_results.find_maximums()]\n",
    "assert np.all((key == recv_key)), \"Failed to recover encryption key\\nGot: {}\\nExpected: {}\".format(recv_key, key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert (attack_results.pge == [0]*16), \"PGE for some bytes not zero: {}\".format(attack_results.pge)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# MBED Often has a few bytes with low (0.2-0.4) correlation, causing it to fail this test\n",
    "if CHECK_CORR:\n",
    "    max_corrs = [kguess[0][2] for kguess in attack_results.find_maximums()]\n",
    "    assert (np.all([corr > 0.6 for corr in max_corrs])), \"Low correlation in attack (corr <= 0.6): {}\".format(max_corrs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
